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Abstract 

The publication 1 suffers from several caveats: i) The method is based upon the false 
assumption that the median of x 2 distributed random variables is x 2 distributed, ii) The 
information contained in the data is not fully used, iii) It is not clear how the uncertainties 
associated to the fitted parameters can be evaluated. A correct solution of the problem is 
presented and results of Ref. [I] are compared to results obtained using the approach described 
in our textbook [2J. Finally, we correct false statements in pQ about a section in our book. 

1 Introduction 

The determination of parameters of a theory from experimental data that are distorted by exper- 
imental effects is a relatively simple problem. This is why the solution of this problem had not 
been published in a referenced scientific journal so far, but last year Gagunashvili has submitted 
the cited paper. 

The problem is solved in the following way: The experimental data are compared to a Monte 
Carlo simulation in form of histograms of the observed variable x' which due to the finite ex- 
perimental resolution differs from the true variable x. A x 2 expression is formed that measures 
the statistical difference between the two histograms. The simulated histogram depends on the 
parameter 9 of interest. (9 may also represent a set of parameters.) In a least square (LS) fit the 
parameters are estimated. The effect of changing the parameters is implemented by changing the 
weights of the simulated events, w(x) — / (x\9) / f (x\9q) where x is the undistorted variable, 9 the 
parameter of interest of the p.d.f. f(x\9) and 9q its value used in the simulation. This is explained 
in more detail in Ref. [2]. 

The purpose of this comment is twofold: i) We want to point out some caveats of the treatment 
in Ref. pQ which will lead in some applications to biased results and wrong error assignments. 
A correct solution is presented, ii) We want to correct several false statements in this article in 
respect to our textbook [2] . In a numerical example we compare our results to those published in 
Ref. [I]. 

2 The x 2 approximation - multinomial versus Poisson dis- 
tribution 

To illustrate the problem, we look at a simple example: In an experiment the slope of a linear 
distribution is to be measured. A certain amount of data has been accumulated and distributed 
into histogram bins. The number of events in each bin follows a Poisson distribution the mean of 
which depends on the flux and the slope of the distribution. Usually the flux is not known and thus 
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we have two unknown parameters, the slope and the normalization of our prediction to the data. 
The likelihood principle tells us that it does not matter whether the experiment has been stopped 
after a certain running time or after a certain amount of data had been collected. This follows 
from the likelihood principle and has been shown explicitly in Ref. [3] for the maximum likelihood 
estimate (MLE). Asymptotically, in the Gaussian approximation which we use below, the least 
square (LS) expression coincides with half the negative log-likelihood function up to irrelevant 
terms and thus also the LS estimate is independent of the stopping condition. In Ref. [I] is 
assumed that the number of events is fixed and a multinomial approach is applied. This condition 
is rarely realized and does not require the multinomial treatment but the latter is correct, too. 

For a Poisson distributed number of events di in a histogram bin i and predictions tj which 
depend on parameters of interest, the distribution W(di, ..,ds) in the histogram with B bins is 
described by 

B 

w=np(di\u) (i) 

i=l 

where P(m\X) is the Poisson distribution of m with mean A. It can also be written as 

W = P(d\t)M(d 1 ,..,d B \pi,..,PB,tD (2) 

with d = Yif =1 di, t = TifU, pi — U/t and M the multinomial distribution for d events distributed 
into B bins with probabilities pi,..,ps- Relations ([1]) and ([2|) describe the same distribution. 
We are free to use either form (see also Ref. [I]), but (p} is to be preferred in our problem 
because the formulas become much simpler than with In [T] % 2 is evaluated based upon 

multinomial statistics, while in [2] Poisson statistics has been applied. Both ways are correct, but 
the multinomial approach is involved because correlations have to be handled explicitly. 

If only relative predictions are available, we have to normalize the predictions. From the 
factorization of @ we find the maximum likelihood estimate t = d which is introduced into Jl]), 
Stj = T,di = d. 

To compare B Poisson distributed numbers di to a prediction tj, we form the statistic x 2 = 
^F=i[(di — ti) 2 /ti] which follows a \ 2 distribution with B degrees of freedom (NDF) in the ap- 
proximation where for each bin i the Poisson distribution of di with mean ti can be approximated 
by a normal distribution with mean and variance ti. When the predictions ti are normalized, i.e. 
Edi = £ij, we loose one degree of freedom (1 parameter estimated) and have NDF = B — 1. If P 
parameters of interest have to be estimated, we have NDF = B — P — 1 

If we have a Monte Carlo prediction consisting of Ki events in bin i with weights Wik , we have 
ti = cYikWik where c is an overall normalization constant and we get: 

X 2 -Ex^E ( ^~ CSfcWtfc)2 . (3) 
f-f f—' cT, k w ik 

When we estimate the parameters hidden in the weights, also c is a free parameter in the 
fit and as above NDF = B — P — 1. In (J3J the relative statistical error of ti is assumed to be 

— 1/2 

small compared to d i . This covers probably more than 90 % of all cases in particle physics 
applications. 

If the statistical error of the simulation cannot be neglected, we consider the asymptotic case 
where not only the distributions of rfj but also that of Yi^Wi^ can be approximated by normal 
distributions. We form 

2 _s^ ( d i -cEkWjk) 2 (i) 

i=l i 

To obtain a \ 2 expression, the quantity af per definition has to be the variance of di — cY<kWik 
under the assumption that the prediction is correct. A reasonable approximation is obtained from 
error propagation, &1 — di + c 2 Yikwf k . This approximation is adequate in most of the remaining 
cases. A correct treatment includes a in the fit. A maximum likelihood estimate (see Appendix 
13, subsection 13.8.1 and relation (13.32) of Ref . 2 in different notation and Ref. [5]) is: 
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dt = c(d i ^a- + x k w ik ). (5) 

To derive ([5]) the expected values E(wik) and E(w 2 fe ) have been replaced by the empirical values 
^kWik/Ki and SfcW^./ifj. For convenience, the derivation of ([5]) is given in the Appendix. 

The normalization c is a free parameter in the fit. Asymptotically, the fitted normalization 
reproduces the number of observed events. 

3 Caveats, restrictions and difficulties of the approach de- 
scribed in pTjj . 

3.1 The choice of the median 

The difficulties in the approach in [1] are related to the application of the multinomial distribution 
where the correlations have to be explicitly handled. There, xf is evaluated excluding bin i for 
all B values of i and then the median is computed without any explanation and discussion of its 
properties. 

The ad hoc solution to take the median instead of the mean of many incomplete evaluations is 
used because the median is a robust estimator - an indication that something is problematic with 
this approximation, but both choices are wrong: Neither the mean nor the median of the Xndf 
distributed values is described by a X%df distribution as long as the values are not identical. For 
a typical example taken from [6] and using Relation (34) of [6], we find that the mean value of 
the medians of samples of partially correlated random variables following a Xs-2 distribution is 
typically by half a unit off the nominal value of NDF = B — 2. This bias does not disappear 
with increasing statistics. As a consequence the evaluation of parameter uncertainties from the 
curvature of the x 2 curve near its maximum is doubtful and p- values derived from a x 2 test are 
wrong. The fact that in specific examples the point estimate is sensible does not validate the 
proposed procedure. 

3.2 The loss of a constraint 

Comparing a prediction of a histogram to a histogram of observed events where the prediction is to 
be normalized to the data, we have NDF = B — 1 — P degrees of freedom, where P is the number 
of additional parameters of interest. In [T] due to the difficulties created by the correlations, one 
additional degree of freedom is given away, NDF — B — 2 — P. As a consequence, precision is lost 
and, for example, an asymmetry in a two-bin histogram cannot be determined. 

3.3 The weight restriction 

The weights in pQ are restricted to functions of the histogrammed variable. This condition is 
violated in unfolding problems. The weight restriction seems not to be necessary for unnormalized 
weights but this point should be clarified officially by the author. 

3.4 The error treatment 

In fJjj it is not explained how the parameter errors are obtained in the fitting procedure. A parameter 
fit without error assignment is useless. The reader will probably assume that the errors are obtained 
in the standard way from the variation of \ 2 as a function of the parameter (at some point MINUIT 
errors are quoted). However, the standard error estimation can be wrong independent of the fact 
that in the evaluation of the x 2 statistic the errors are correctly implemented as a function of the 
parameter values. The reason is explained in our book: It is related to the fact that due to the 
weighting the denominators of the \ 2 expression may have a sizable dependence of the parameters. 
The effect is small in most applications but can be large, if histograms with a large number of 
bins and large smearing are fitted. Therefore, in situations where the uncertainty of the simulated 
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numbers cannot be neglected, the validity of the error assignment has to be checked, for instance 
by changing the amount of Monte Carlo events. 

In addition to the problems related to weighting, the error handling in least square fits where 
crude approximations of the x 2 statistic are used should have been discussed. The effects are 
especially important with low event numbers. 

3.5 Problems with small event numbers 

In many experiments the number of events is so small that the application of a least square fit 
is problematic. In [1 no solution for this situation is presented. However in this case enough 
Monte Carlo events can be generated such that their statistical error is negligible and a Poisson 
likelihood fit can be performed as explained in Ref. [2j [5] . 

3.6 Technical difficulties 

In the approach of [1] a parameter fit with a histogram of B bins requires for each change of the 
parameter during the minimum search B additional fits of an auxiliary parameter. For a two- 
dimensional histogram with 20 x 20 bins and 1000 minimum searching steps in the fit this means 
that 400,000 auxiliary parameters have to be estimated in LS fits. 

4 Inflicting statements about the content of our book 

In Section 1 of Ref. pp figures the following paragraph: 

"In [2], a re- weighting procedure for fitting a Monte Carlo reconstructed distribution to the 
reconstructed data was proposed. The procedure is presented rather sketchily, and cannot be 
repeated even for the example that was used in [2] for illustration. There is not a clear explanation 
of how the parameters and the errors in them were calculated. The authors of [2] stated without 
proof, that the statistic used for the fitting of the parameters had a x 2 distribution but did not 
define the number of degrees of freedom. This makes it impossible to use this statistic for choosing 
the best model from a set of alternative models." 

The reference was our book Ref. [2]. 

These statements are false: 

- Two methods are proposed. The likelihood ratio solution is not mentioned in Ref. [T|. There 
are two examples. May be, the explanations were a bit short but certainly understandable in 
the context of preceding chapters of our textbook. We would have been glad to furnish further 
explanations to Gagunashvili. Our method is considerably simpler than that proposed in Ref. pp , 
certainly not less precise and we are convinced that it is easier to understand than that presented 
in Ref. [J. 

- Contrary to the statement of Ref. pp, in the two examples no parameter estimates and 
errors were quoted. The section the author of Ref. pp refers to is Section 6.5.9, "Comparison of 
Observations with a Monte Carlo Simulation" . It is part of a chapter on parameter inference in 
which it is explained in detail how parameters and their errors are estimated in least square and 
likelihood fits (see Sections 6.5.3 and 6.5.5, subsection x 2 approximation). A subsequent chapter 
on interval estimation discusses error assignments even in more detail. 

- The author apparently refers to formula (6.17) in different notation of Section 6.5.9 of our 
textbook: 



where di was as the number of experimental events in bin i, cnii the Monte Carlo prediction with 
c a Monte Carlo normalization constant and rtii the corresponding sum of weights in bin i, B was 
the number of bins. It was stated that the formula is valid if the statistical uncertainty of mi 
is negligible. The formula was derived and explained in the section "x 2 approximation" where 
X 2 for a histogram with Poisson distributed numbers was discussed. The NDF are irrelevant for 
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parameter estimation. (The NDF has to be known in goodness-of-fit tests which is a different 
subject and which is treated in a subsequent chapter of our book. Independent of this fact, the 
number of degrees of freedom (NDF) for y 2 statistics depending on fitted parameters were defined 
in Chapter 3 and therefore are known to the reader. The relation needed to apply a goodness-of-fit 
test for weighted histograms (13.32) is given in the Appendix 13 where also the NDF are defined. 
The relation needed for parameter estimation in the case that the uncertainty of the simulation 
has to be taken into account is given in the subsection 13.8.4 but the statements in [1 referred to 
the main text.) 

5 Example 

To perform a quantitative comparison of our approach to that of Ref. pQ, we have applied our 
method to the first example given in Ref. pQ which was also used in our book to illustrate our 
method. The slope of a linear distribution is to be adjusted. The p.d.f. is f(x\a) = (1 + ax) /(l + 
a/2), with x S [0, 1] and a > — 1. For the "experimental" events the slope parameter was a = 1 
and for the Monte Carlo events a m = 0. Events were generated in the interval [0, 1]. The variable 
x was smeared with a Gaussian resolution of a = 0.3. The smeared distribution was subdivided 
into 5 or 20 bins of equal width in the range between —0.3 and 1.3. The number of experimental 
and simulated events was 500, 5000 and 50000. Each case was simulated 10, 000 times. Some 
choices required to take the error of the simulation into account. We applied formula (13.32) of 
Ref. [2J which corresponds to Eq. j5J. The slope parameter is hidden in the weights Wk- In the 
least square fit we included only bins with more than 5 events. Not all combinations quoted in 
Ref. pQ were repeated. 

We compare our results to those of Ref. pQ in Table 1. The results of Ref. [I] are quoted in 
parenthesis. In the lines denoted by "+" and "— " some kind of estimates of the positive and neg- 
ative errors as defined in Ref. pQ are given. In addition to the mean of the slope parameter which 
has the nominal value one, we quote the root mean square deviation (r.m.s.) of the distribution 
of the fitted slope parameter. For high statistics our and the results of Ref. pQ agree, for small 
event numbers we observe mostly a smaller bias and obtain smaller errors. The results for the 
specific case with 5 bins, 500 observed and 500 simulated events are unstable because arbitrarily 
large parameter values occur if the number of experiments is continuously increased. Also other 
cases with small observed or simulated event numbers may suffer from some rare cases where 
large slopes are found. Therefore we are not sure that all differences that we observe are really 
significant. Anyway, least square fits based on a % 2 approximation are problematic with such low 
event numbers. For physics applications only the last column of the table with 20 bins and 50000 
simulated events is relevant. 

We have also applied a maximum likelihood fit. The results were similar to those from the 
least square fit. 

6 Conclusions 

The comparison of histograms of statistical data with a Monte Carlo prediction should be treated 
in the framework of Poisson statistics. The correlation of the event numbers in the different bins 
can be taken into account by the normalization of the data to the prediction in the fit. This 
approach is considerably simpler than the method of Ref. pQ which starts from a multinomial 
distribution. 

The treatment of Ref. [T] is based on the false assumption that the median of a sample of 
X 2 distributed random variables also follows a \ 2 distribution with the same number of degrees 
of freedom. As a consequence, error estimates based on the corresponding statistic are wrong. 
Furthermore the result does not exploit the full information of the data in that it gives away one 
degree of freedom. Small event number cannot be handled and there is no error treatment. In 
the quantitative comparison of the point estimates of the two approaches in a special example 
published in Ref. pQ, the results are found to be similar but they are slightly more precise in our 
method. 
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Table 1: Results from fitting the linear slope parameter a (with nominal value unity, see text) 
with our method compared with the results from [1] (in parenthesis) for various sample sizes and 
bin numbers. Besides mean and root mean square (rms) values some positive and negative +, — 



error estimates as defined in [T 



are given. 



# data 




5 bins 


5 bins 


20 bins 


20 bins 


500 


# MC 


500 


50000 


500 


50000 


mean 
+ 

rms 


1.25 (1.29) 
3.37 (3.13) 
0.60 (0.66) 
1.15 


1.11 (1.13) 
0.91 (0.81) 
0.46 (0.54) 
0.65 


1.10 (1.17) 
1.65 (1.84) 
0.59 (0.61) 
1.14 


1.00 (1.07) 
0.72 (0.79) 
0.43 (0.45) 
0.55 


5000 


mean 

+ 

rms 


1.12 (1.12) 
0.87 (0.93) 
0.48 (0.52) 
0.66 


1.01 (1.01) 
0.21 (0.23) 
0.16 (0.17) 
0.19 


1.11 (1.11) 
0.89 (0.91) 
0.47 (0.47) 
0.64 


1.01 (1.00) 
0.20 (0.20) 
0.15 (0.16) 
0.18 


50000 


mean 
+ 

rms 


1.10 (1.10) 
0.86 (0.87) 
0.46 (0.49) 
0.63 


1.00 (1.00) 
0.08 (0.09) 
0.07 (0.08) 
0.08 


1.11 (1.10) 
0.77 (0.83) 
0.46 (0.45) 
0.62 


1.00 (1.00) 
0.08 (0.08) 
0.07 (0.07) 
0.07 



We reject the false assertions made in Ref. pQ with respect to our book. 

7 Appendix: Proof of the relation ([5]) 

We prove relation ((SJ) for a single bin and drop the bin index. We consider the quantity d — cKw, 
where d and K are Poisson distributed, w is the mean value of K weights and c is a normalization 
constant common to all bins. In the limit where d, K approach infinity, the statistic 

X c(dE(w 2 )/E(w) +E(K)E(w)) K ' 

is x 2 distributed with one degree of freedom. Equivalently, \px 2 is normally distributed with 
variance equal to one. 

Proof: 

We set 

t = Kw , t = tE(w)/E(w 2 ), c = cE{w 2 )/E(w) . (8) 

Relation jTJ) now reads 

2 _ ( d - ct) 2 

X ~ c{dE(w 2 )/E(w) +E(t)) ' 

According to the central limit theorem, d, t and t are asymptotically normally distributed. Fur- 
thermore we have Var(t) = E(t) which is typical for the Poisson distribution. As asymptotically 
the Poisson distribution approaches the normal distribution, we are allowed to use in the following 
the Poisson approximation of the distribution of t instead of the normal distribution. (Without 
proof we claim that the Poisson approximation is closer to the true distribution than the normal 
approximation. Examples can be found in Ref. [2].) 
The variance a 2 of d — ct is 

a 2 = vax(d) + c 2 var(£) . 

Under the hypothesis that our description is correct, d and t should follow related Poisson 
distributions with mean A and A/c, respectively and 

a 2 = A(l + c) . 
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We fix A to its maximum likelihood estimate from the log-likelihood function derived from the 
product of the two Poisson likelihoods: 

In L = d In A - A + t In - - - . 

c c 

Determining A as usual from the root of dhiL/dX and re-substituting c using ([8]) we get 

X =I ^-_(d + i) (9a) 

and with 

a 1 = c(d + t) = c(dE(w 2 )/E{iv) +E(i)) 

the assertion ([7]). 

We replace the expected values by their empirical estimates and obtain (|5|). 

Remark 1: A different and more complicated estimate of a is obtained if we use the normal 
approximations for the distributions of d and t, but in the asymptotic limit the different approxi- 
mations coincide. Therefore our result does not depend on the use of the Poisson approximation 
for the distribution of the weighted sum. 

Remark 2: If t(9) depends on a parameter 9 that has to be estimated, the parameter A and in 
principle also E(w),E(w 2 ) are nuisance parameters. Estimating them out is correct for the point 
estimate of 9. For the interval estimate the three nuisance estimates depend on 9. Eliminating 
them with Rel. (|9a[) and taking the empirical means corresponds to a profile likelihood treatment 
of the error estimate of 9. 
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